{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 8: Tools and best practices for mathematical programming\n", "**References:**\n", "* [General conventions on programming in SageMath](https://doc.sagemath.org/html/en/developer/coding_basics.html)\n", "* [PEP 8 style guide for Python code](https://peps.python.org/pep-0008/)\n", "* [flake8](https://flake8.pycqa.org/en/latest/) for checking coding conventions, and [autopep8](https://pypi.org/project/autopep8/) for automated code cleanup\n", "* [Debugging code in Python](http://scipy-lectures.org/advanced/debugging/)\n", "* [Code optimization in Python](https://people.duke.edu/~ccc14/sta-663-2017/10A_CodeOptimization.html)\n", "\n", "\n", "**Summary:**
\n", "In this lecture, we start with discussing the best **formats to store code** and good **coding styles**. Then we learn about some tools to **debug** your code (i.e. find mistakes) and how raise and interpret **exceptions**. We show how to write good **documentation** for your code and how you can use it to **test** whether it does what you want. Finally, we see how you can use a **profiler** to find the slow portions of a program, and some basic strategies for speeding things up.\n", "\n", "In the appendix, you find a short crash course on how to use the **console** (which is needed e.g. for the tests mentioned above)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "Until now, the emphasis of the lecture has been on **performing concrete computations**. In the course of this, we have already seen that it can be useful to write functions and classes that encapsulate algorithms and sub-computations that we want to call repeatedly. \n", "\n", "The focus for the rest of the course will shift to this activity of **mathematical programming**, i.e. how to develop and share good code, which can then be used in more complicated computations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Formats of storing and sharing code\n", "In the lectures so far we have worked with **Jupyter notebooks**, and these can be a nice way to share short code snippets and computations. For instance, imagine you thought of the following revolutionary algorithm for computing square roots:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def square_root(n):\n", " for m in range(n+1):\n", " if m^2 == n:\n", " return m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then you could create a new notebook ``sqroot.ipynb``, have a cell as above, and even include a cell where you show how to apply the function in practice:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "square_root(4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is fine in principle, but can have a few downsides:\n", "* Some people like to work in a console, and they would have to copy-paste your code by hand.\n", "* Others might want to integrate your code into their own programs, again having to copy-paste.\n", "* This problem only gets worse if you have many lines of code.\n", "\n", "A first way of solving this problem is to save the code into a **Sage file**. Let's try this out (if you haven't done so, see the Appendix on using the console below):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "* Create a new file ``sqroot.sage`` in the same folder as the current notebook ``08 Tools and best practices for mathematical programming.ipynb``.\n", "* Open it using a [text editor](https://en.wikipedia.org/wiki/Comparison_of_text_editors#Cross-platform), paste the code of the function ``square_root`` above into the file and save it.\n", "* In the current notebook, restart the kernel via ``Kernel -> Restart`` and convince yourself that afterwards, the SageMath session no longer knows the function ``square_root``.\n", "* Load the content of the file ``sqroot.sage`` via the following cell, and check that afterwards the function ``square_root`` is known by computing $\\sqrt{9}$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "load(\"sqroot.sage\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "square_root(9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Such a Sage file is perfectly adequate for working on and sharing code of moderate length. In the next lecture, we will discuss a slightly more professional way of doing this, via **Python files and packages**. We'll talk about the additional features that this brings, but for now, we will stick to Sage files." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Addendum**
Answering a question from the audience: it is also possible to \"load\" a Jupyter notebook using the magic command ``%run``. E.g. if the folder containing the current notebook also contains e.g. the file ``04 Analysis and power series.ipynb`` that we worked with before, you can execute all code cells in that notebook using the following command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%run \"04 Analysis and power series.ipynb\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then you have access to the variables and functions defined there, e.g. the ``parabola``-function we saw at the beginning of lecture 4:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "parabola(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Good coding style\n", "To make code easier to read and understand, there are certain conventions which should be followed. You can have a look at the [summary of conventions for coding in SageMath](https://doc.sagemath.org/html/en/developer/coding_basics.html) or the more detailed [PEP 8 style guide for Python code](https://peps.python.org/pep-0008/).\n", "\n", "The most important points:\n", "* Use 4 spaces for indentation (and no Tab-characters).\n", "* Use a single space before and after lowest priority operators (like ``=, +, *, ...``) in an expression, and after commas:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "i = i + 1\n", "c = (x+i) * (x-i)\n", "v = (3, 2, 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Use [``snake_case``](https://en.wikipedia.org/wiki/Snake_case) for function names and [``CamelCase``](https://en.wikipedia.org/wiki/Camel_case) for class names:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def my_little_pony():\n", " print('Friendship is magic!')\n", "\n", "class BirdOfPrey(object):\n", " def __init__(self):\n", " self.cloaking = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "Repair the following code to comply with standard conventions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def Isprime(n):\n", " if n<=1:\n", " return False\n", " for m in range(2,n):\n", " if ZZ(m).divides(n):\n", " return False\n", " return True " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Solution (click to expand)\n", " \n", "```\n", "def is_prime(n):\n", " if n <= 1:\n", " return False\n", " for m in range(2, n):\n", " if ZZ(m).divides(n):\n", " return False\n", " return True\n", "```\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are some automated checkers like [flake8](https://flake8.pycqa.org/en/latest/) to test if your code satisfies standard conventions, and [autopep8](https://pypi.org/project/autopep8/) to automatically clean up many issues. For now, don't worry too much about it, but if you plan to get serious about writing code (e.g. by contributing to the SageMath development), you should get familiar with some of these conventions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tools for debugging\n", "Now we can write beautiful code, but there might still be an issue: maybe the code does not work! The process of finding errors in code is called [debugging](https://en.wikipedia.org/wiki/Debugging) and there are some good tools helping us do this." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Syntax errors\n", "When trying to execute the code, you get an error message, pointing out where the syntax is incorrect." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "What goes wrong here?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "num = 4\n", "\n", "if num >= 0:\n", " print(\"The square root is \"+repr(square_root(num)))\n", "else\n", " print(\"This number has no square root.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Solution (click to expand)\n", " \n", "Missing ``:`` after ``else``:\n", "```\n", "num = 4\n", "\n", "if num >= 0:\n", " print(\"The square root is \"+repr(square_root(num)))\n", "else:\n", " print(\"This number has no square root.\")\n", "```\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Understanding value and runtime errors using %debug\n", "Sometimes a program is interrupted because at some point in the program an **exception** is raised, e.g. a ``ValueError`` to indicate that the value of a certain variable does not make sense in a given context. You then see the chain of function calls leading to this exception." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def print_inverses_up_to(x):\n", " for n in range(x+1):\n", " print(1/n)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print_inverses_up_to(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, we can upen a **debugging tool**, allowing us to explore the state of the program when it failed as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%debug" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Click here to see an example session\n", " \n", "```\n", "> /home/sage/rings/integer.pyx(2087)sage.rings.integer.Integer._div_ (build/cythonized/sage/rings/integer.c:14435)()\n", "\n", "ipdb> u\n", "> (3)print_inverses_up_to()\n", " 1 def print_inverses_up_to(x):\n", " 2 for n in range(x+Integer(1)):\n", "----> 3 print(Integer(1)/n)\n", "\n", "ipdb> p n\n", "0\n", "ipdb> p type(n)\n", "\n", "ipdb> q\n", "```\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Useful commands:\n", "```\n", "h see help menu\n", "u go one function call up\n", "p print value of variables (can also call functions on these variables and print output)\n", "q quit\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "Several things go wrong here. Can you fix them?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def double_of(n):\n", " print('The double of '+repr(n)+' is '+repr(2*n))\n", "\n", "def quadruple_of(m):\n", " return double_of(double_of(n))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "quadruple_of(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Solution (click to expand)\n", " \n", "* The argument of ``quadruple_of`` is called ``m``, but the function ``double_of`` is called for ``n``. This defaults to the global function ``n`` (which you use to compute ``n(pi)`` as ``3.1415...``).\n", "* The function ``double_of`` prints a nice statement, but does not return the double of its input (and thus gives back ``None``, the default value when a function ends without having called a ``return``).\n", "\n", "Here is the corrected code:\n", "```\n", "def double_of(n):\n", " print('The double of '+repr(n)+' is '+repr(2*n))\n", " return 2*n\n", "\n", "def quadruple_of(m):\n", " return double_of(double_of(m))\n", "\n", "quadruple_of(5)\n", "> \n", "The double of 5 is 10\n", "The double of 10 is 20\n", "20\n", "```\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use ``%debug`` before something breaks. By typing ``%debug code_line``, you can start executing a certain line of code in the debugging environment above. Using the commands ``s`` (to execute the next step of the program) and ``n`` (to let the program run until the next line is reached) you can then trace the computation step by step and check the values of variables and order of execution, etc." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%debug quadruple_of(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Click here to see an example session\n", " \n", "```\n", "NOTE: Enter 'c' at the ipdb> prompt to continue execution.\n", "> (1)()\n", "\n", "ipdb> s\n", "--Call--\n", "> (4)quadruple_of()\n", " 1 def double_of(n):\n", " 2 print('The double of '+repr(n)+' is '+repr(Integer(2)*n))\n", " 3 \n", "----> 4 def quadruple_of(m):\n", " 5 return double_of(double_of(n))\n", "\n", "ipdb> s\n", "> (5)quadruple_of()\n", " 1 def double_of(n):\n", " 2 print('The double of '+repr(n)+' is '+repr(Integer(2)*n))\n", " 3 \n", " 4 def quadruple_of(m):\n", "----> 5 return double_of(double_of(n))\n", "\n", "ipdb> p m\n", "5\n", "ipdb> s\n", "--Call--\n", "> (1)double_of()\n", "----> 1 def double_of(n):\n", " 2 print('The double of '+repr(n)+' is '+repr(Integer(2)*n))\n", " 3 \n", " 4 def quadruple_of(m):\n", " 5 return double_of(double_of(n))\n", "\n", "ipdb> p n\n", "\n", "ipdb> s\n", "> (2)double_of()\n", " 1 def double_of(n):\n", "----> 2 print('The double of '+repr(n)+' is '+repr(Integer(2)*n))\n", " 3 \n", " 4 def quadruple_of(m):\n", " 5 return double_of(double_of(n))\n", "\n", "ipdb> p n\n", "\n", "ipdb> q\n", "```\n", "We can see that when we call ``double_of`` for the first time, its argument is the function ``n`` computing numerical approximations, instead of the expected number ``5``.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Raising exceptions yourself\n", "Note that sometimes it can be good to raise exceptions yourself, since they allow you and others to find errors more easily. Take our good old square root function above, which we now build on to define a fourth root:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def square_root(n):\n", " for m in range(n+1):\n", " if m^2 == n:\n", " return m\n", "\n", "def fourth_root(n):\n", " return square_root(square_root(n))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fourth_root(16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following gives a very confusing error message:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fourth_root(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What goes wrong is that even the square root function for input ``15`` returns ``None``." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "square_root(15)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(square_root(15))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's better to immediately raise an exception when our program does not find a solution:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def square_root(n):\n", " for m in range(n+1):\n", " if m^2 == n:\n", " return m\n", " raise ValueError('n is not a square')\n", "\n", "def fourth_root(n):\n", " return square_root(square_root(n))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fourth_root(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It also allows the programmer of ``fourth_root`` to try alternative solutions if our ``square_root`` function fails:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def fourth_root(n):\n", " try:\n", " return square_root(square_root(n))\n", " except ValueError:\n", " return n^(1/4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fourth_root(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is a small list of the most common types of exceptions in Python that you would typically include in your own code:\n", "\n", "| Exception | Usage |\n", "| --- | --- | \n", "| ``ValueError`` | a function receives an argument that has the right type but an inappropriate value |\n", "| ``NotImplementedError`` | a function has not been implemented yet (maybe for a particular type of input) |\n", "| ``TypeError`` | a function is applied to an object of inappropriate type (e.g. a ``string`` when an ``int`` is expected) |\n", "| ``AssertionError`` | raised by ``assert condition`` if ``condition`` is ``False``; used for debugging |\n", "| ``RuntimeError`` | an error is detected that does not fall into any of the other categories |\n", "\n", "[Here](https://docs.python.org/3/library/exceptions.html) you find the official Python 3 documentation about exceptions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Documentation and doctests\n", "We have already seen and used many times that we can look at the documentation of a SageMath function by typing something like ``primes?``." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "primes?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this documentation, we get\n", "* a description what the function does,\n", "* a list of its arguments, their types, default values and meanings,\n", "* a block of example computations, showing how to use the function in practice and what the output looks like.\n", "\n", "It is an excellent idea to add documentation to your own functions. This is done by specifying a string (enclosed by ``r\"\"\" ... \"\"\"``) at the beginning of a function definition:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def multiply(a=1, b=1):\n", " r\"\"\"\n", " Return the product of a, b.\n", "\n", " INPUT:\n", " \n", " * \"a\" - integer (default: 1) - first factor\n", " \n", " * \"b\" - integer (default: 1) - second factor\n", "\n", " EXAMPLES::\n", "\n", " sage: multiply(2,3)\n", " 6\n", " sage: multiply(0,-2)\n", " 0\n", " sage: multiply(8)\n", " 8\n", " sage: multiply()\n", " 1\n", " \"\"\"\n", " return a*b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some remarks:\n", "* In very simple cases (as above) the ``INPUT`` section can be omitted.\n", "* The ``EXAMPLES`` section is indented and should look like copy-pasting the text of a SageMath session in a console (hence the ``sage:`` prompts). In particular, running such a session is a good way to generate this part! Also, it is important to be creative with possible inputs, exploring some edge cases (like ``0`` or negative numbers above), and also demonstrating values for optional arguments.\n", "* Many more good practices, further possible sections, etc can be found in [this official guide](https://doc.sagemath.org/html/en/developer/coding_basics.html#documentation-strings)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "* Open again the file ``sqroot.sage`` from the beginning, and add a documentation to it. (*Hint:* As in many situations, it's a good idea to just copy-paste and modify an existing docstring, like from the function ``multiply`` above).\n", "* Load the file and look at your docstring using ``square_root?``." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Solution (click to expand)\n", " \n", "Here is a possible docstring:\n", "```\n", "def square_root(n):\n", " r\"\"\"\n", " Return the square root of n.\n", "\n", " INPUT:\n", " \n", " * \"n\" - integer - an integer, assumed to be a square\n", "\n", " EXAMPLES::\n", "\n", " sage: square_root(4)\n", " 2\n", " sage: square_root(0)\n", " 0\n", " sage: square_root(49)\n", " 7\n", " \"\"\" \n", " for m in range(n+1):\n", " if m^2 == n:\n", " return m\n", "```\n", "After saving the changes to ``sqroot.sage``, we load the file again and look at the documentation as follows:\n", "```\n", "load(\"sqroot.sage\")\n", "square_root?\n", "```\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One cool feature of having a documentation with ``EXAMPLE`` section as above, is that you can use it to test your code and make sure that any changes you make do not break some previously working function. This can be done via SageMath's **doctests**. The idea is simple: a program goes through all documentations of functions that you wrote, looks for lines starting with ``sage:`` and runs those lines in SageMath. If the output is not identical to the one specified in the documentation (meaning that the string printed in the terminal is the same), it raises a warning. Let's try it out:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "Open a console and navigate to the folder containing the file ``sqroot.sage``, which should contain the function ``square_root`` whose documentation has some ``EXAMPLE`` block as above. Call the command\n", "```\n", "sage -t sqroot.sage\n", "```\n", "and look at the output.\n", "\n", "*Remark:* This assumes that typing ``sage`` in your console starts a SageMath session. Depending on your installation, you should change ``sage`` to ``Sage-9.4`` or ``/home/users/donald_duck/sage/sage`` (i.e. the file in your local installation).\n", "\n", "*Bonus:* Ideally, the above should produce an ouput saying that all tests passed. You can try changing the function ``square_root``, e.g. by inserting a line ``return 42`` at the beginning, and repeat the procedure to check that now the doctests produce errors.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Profiling and optimization\n", "In practial applications, it can sometimes happen that you have some code which produces the correct answer, but which is too slow to be useful. In this section, we'll discuss some tools that can help you find the slow parts of the code, and some possibilities for making things faster.\n", "\n", "We'll begin with an innocent-looking example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def my_fibonacci(n):\n", " if n==0:\n", " return 0\n", " if n<=2:\n", " return 1\n", " return my_fibonacci(n-1)+my_fibonacci(n-2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "[my_fibonacci(n) for n in range(10)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a perfectly correct implementation of the Fibonacci numbers. Using the [magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html) ``%time`` we can check how it performs relative to the pre-defined function ``fibonacci``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = 20\n", "%time a = my_fibonacci(n)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%time b = fibonacci(n)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a == b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While they give the same answer, the ``fibonacci`` function is much faster, and our handmade version even starts struggling at around ``n=30`` to produce an answer in any reasonable time.\n", "\n", "One way to explore what happens is using the magic ``%prun``, which does a **profiler run** of the command that follows. This means that this command is executed and a record is created, remembering which functions are called how many times, and how long that took. Let's see it in action:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%prun a = my_fibonacci(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that to compute the $20$th Fibonacci number, our function ``my_fibonacci`` is called a total number of $13529$ times! The reason is that ``my_fibonacci`` does not remember when it has been called previously, so in the course of computing ``my_fibonacci(n)``, the value of ``my_fibonacci(n-2)`` is computed *twice*, ``my_fibonacci(n-3)`` is called *three times*, ``my_fibonacci(n-4)`` is called *five times* and so on (do you see the pattern?).\n", "\n", "See [here](http://homepages.math.uic.edu/~jan/mcs320/mcs320notes/lec19.html#figfibfuncallstree) for a nice explanation and picture of this tree of function calls.\n", "\n", "To avoid this, we can use **memoization**: we make the function ``my_fibonacci`` create a list of all previous inputs and outputs. When it is called, it first looks up in this list if it has already computed the accociated value. If yes, the stored value is returned, if not the function code is executed, the answer is stored and then returned by the function. From a practical point of view, we simply have to add a line ``@cached_function`` before our function definition:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@cached_function\n", "def my_fibonacci(n):\n", " if n==0:\n", " return 0\n", " if n<=2:\n", " return 1\n", " return my_fibonacci(n-1)+my_fibonacci(n-2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = 900\n", "%time a = my_fibonacci(n)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%time b = fibonacci(n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For optimizing the function ``my_fibonacci`` above, the main important output of ``%prun`` was to show us that the function is called a large number of times without memoization. In more complex examples, it is also important to find out how much computation time is spent in different functions. \n", "\n", "Take the following example case: somewhere in our computation, a large square matrix of numbers in $\\overline{\\mathbb{Q}}$ is created, and we want to compute its characteristic polynomial. However, this seems very slow in practice, so we can start to investigate:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n=40\n", "M = matrix(QQbar,[[randint(-100, 100) for i in range(n)] for j in range(n)])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%prun -s cumulative a = M.characteristic_polynomial()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we sorted the output of ``%prun`` by the **cumulative time**. This means: if the method ``_add_`` has a cumulative time of 3.5 seconds, then the computation spends 3.5 seconds in the method ``_add_`` itself, or in methods called from this. Conversely, the **internal time** (listed under ``tottime``) tells us how much time is spent on basic Python operations like ``for`` loops or additions of ``int``s inside the actual code lines of ``_add_``. \n", "\n", "In my experience, cumulative time is almost always more useful to find the slow parts of your code.\n", "\n", "But what to do about the matrix ``M`` above? From the output of ``%prun`` we see that a big chunk of the time is spent on basic addition and multiplication in ``QQbar``. Taking a look at the actual code, we see that the entries of the matrix ``M`` are not arbitrary numbers in ``QQbar`` but in fact integers. Thus it is much faster to compute the characteristic polynomial with integer coefficients (where addition is quick)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "N = matrix(ZZ, M)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%prun -s cumulative b = N.characteristic_polynomial()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a == b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "Let $n \\geq 1$ and consider the $n \\times n$-matrix $M$ given by\n", "$$\n", "M = \\left(\\frac{\\partial^i}{\\partial x^i} \\frac{\\partial^j}{\\partial y^j} x^{2n} y^{2n} \\big|_{(x,y)=(1,1)} \\right)_{i,j=1, \\ldots, n}\n", "$$\n", "The function ``strange_quantity`` below computes the determinant of ``M`` in dependence of ``n``. Even though the problem seems manageable, the function struggles at ``n=8`` and downright crashes at higher ``n``. Can you find out which parts are slow? Can you see how to make it faster (and not crash)?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def strange_quantity(n):\n", " var('x,y')\n", " f = (x*y)^(2*n)\n", " M = matrix([[f.derivative(x,i,y,j) for i in range(n)] for j in range(n)])\n", " for i in range(n):\n", " for j in range(n):\n", " M[i,j] = M[i,j].subs(x=1, y=1)\n", " return M.determinant()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Solution (click to expand)\n", " \n", "Using ``%prun -s cumulative strange_quantity(8)`` we can see that almost all the time is spent in the final computation of the determinant and inside there, the time is spent in ``maxima``, which is the engine handling symbolic expressions in SageMath. A priori this might sound strange: by substituting ``x=y=1``, we would expect that the entries of ``M`` are constants (in fact, Integers). But that's not the case: after ``subs``, the entries of ``M`` are still symbolic expressions. To see this, you can insert a line ``print(type(M[0,0]))`` before the final ``return`` statement, which prints:\n", "```\n", "\n", "```\n", "Thus as before, we can make things faster by making sure we have an integer matrix before computing the determinant. Here is a variant of the code that does this:\n", "```\n", "def strange_quantity(n):\n", " var('x,y')\n", " f = (x*y)^(2*n)\n", " M = matrix([[ZZ(f.derivative(x,i,y,j).subs(x=1,y=1)) for i in range(n)] for j in range(n)])\n", " return M.determinant()\n", "```\n", "Now the code finishes extremely fast (for ``n`` up to 40). Going even higher, it starts getting slow again, and ``%prun`` shows us that now the bottleneck is the ``derivative`` method. If you do need ``strange_quantity`` for higher ``n``, one possibility is to compute a formula for the entries of ``M`` by hand (it's not hard) and program that formula instead of using symbolic derivatives.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many general strategies for writing fast code, e.g.\n", "* avoid unnecessary copy operations\n", "* prefer list comprehensions over ``for`` loops\n", "* prefer generators over lists\n", "* ... (see the references at the beginning of the notebook)\n", "\n", "These are good to learn and apply in the long term. But for a concrete programming problem, it's often already half the solution once you know which part is slow." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Programming workflow\n", "Above we learned how to write good code, find mistakes in it (I guess it's not *that* good after all) and speed it up. An important final piece of advice is that *this is the right order* to approach a programming project. Or, to quote from a [guide to code optimization](https://people.duke.edu/~ccc14/sta-663-2017/10A_CodeOptimization.html) linked in the references above, when approaching a programming problem, you should

\n", "\n", "\n", "\n", "1. Make it run\n", "2. Make it right\n", "3. Make it fast\n", "\n", "\n", "\n", "To expand this a bit, a good workflow is to\n", "1. Write a **first version of the program** that should work in principle. It often helps to already start writing the **documentation** here, explaining the precise input and expected output.\n", "2. Try that version and find all **basic coding errors**. \n", "3. Once it runs without errors on some small example, try **testing it for bigger examples** and check whether it actually outputs the correct result there (this often involves computing some small cases by hand). \n", "4. Add these basic examples to the documentation as **doctests**.\n", "5. Now you can see whether the program is fast enough. If not, begin **profiling to work out the bottleneck of the calculation**, and then optimize it.\n", "\n", "In particular, you should *not* do step 5 before 1-4, since you might a) optimize parts of the code that are not actually slow, or b) break the code without noticing it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Appendix: Crash course on consoles\n", "Probably you already used a console to start the Jupyter notebooks. On Linux and Mac OS the program to start a console is often called ``Terminal``, whereas on Windows (if you use the standard Linux installation) there is the program ``SageMath 9.X Shell`` which serves the same purpose. In the console you enter one command at a time, followed by ``Enter``. Here are a few examples:\n", "```\n", "ls list all files and folders in current directory\n", "cd change directory\n", "mkdir make a new directory (i.e. folder)\n", "touch create a new file\n", "```\n", "Here is an example session:\n", "```\n", "0-~> ls\n", "Desktop Documents Downloads Music Pictures Public Templates Videos\n", "0-~> cd Desktop/\n", "0-Desktop> ls\n", "'01 Introduction.ipynb' '05 Algebra.ipynb'\n", "'02 Linear Algebra.ipynb' '06 Combinatorics.ipynb'\n", "'03 Numbers and symbolic expressions.ipynb' my_sage_script.sage\n", "'04 Analysis and power series.ipynb'\n", "0-Desktop> mkdir new_folder\n", "0-Desktop> ls\n", "'01 Introduction.ipynb' '05 Algebra.ipynb'\n", "'02 Linear Algebra.ipynb' '06 Combinatorics.ipynb'\n", "'03 Numbers and symbolic expressions.ipynb' my_sage_script.sage\n", "'04 Analysis and power series.ipynb' new_folder\n", "0-Desktop> cd new_folder/\n", "0-new_folder> ls\n", "0-new_folder> touch new_file.py\n", "0-new_folder> ls\n", "new_file.py\n", "0-new_folder> cd ..\n", "```\n", "Our old friend, ``Tab`` completion also works here, so if you type ``cd Desk``+``Tab`` it will complete the directory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Assignments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "Find out what goes wrong with the following code for computing the [body mass index](https://en.wikipedia.org/wiki/Body_mass_index) of a list of patients. This example is taken from\n", "the course [Software Carpentry: Programming with Python.](https://swcarpentry.github.io/python-novice-inflammation/11-debugging/index.html), which has many more nice explanations about debugging." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "patients = [[70, 1.8], [80, 1.9], [150, 1.7]]\n", "\n", "def calculate_bmi(weight, height):\n", " return weight / (height ** 2)\n", "\n", "for patient in patients:\n", " weight, height = patients[0]\n", " bmi = calculate_bmi(height, weight)\n", " print(\"Patient's BMI is: %f\" % bmi)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Solution (click to expand)\n", " \n", "For some reason, the program always computes the BMI of the first person, and ignores the others, even though we have a ``for`` loop going through all patients. The explanation is that the code *ignores* the variable of the ``for`` loop and always manually looks up the data of the first patient. Thus the line\n", "```\n", " weight, height = patients[0]\n", "```\n", "should be replaced by\n", "```\n", " weight, height = patient\n", "```\n", "The second thing going wrong is that the printed BMIs are very small! This you would notice if you compute some example by hand (e.g. when writing a doctest for your function). Investigating again, we see that when calling the function ``calculate_bmi``, we have switched the order of the arguments. Thus the fully corrected code is:\n", "```\n", "patients = [[70, 1.8], [80, 1.9], [150, 1.7]]\n", "\n", "def calculate_bmi(weight, height):\n", " return weight / (height ** 2)\n", "\n", "for patient in patients:\n", " weight, height = patient\n", " bmi = calculate_bmi(weight, height)\n", " print(\"Patient's BMI is: %f\" % bmi)\n", ">\n", "Patient's BMI is: 21.604938\n", "Patient's BMI is: 22.160665\n", "Patient's BMI is: 51.903114\n", "``` \n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "Below is an implementation of the sieve of Eratosthenes for computing the prime numbers up to ``n``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def eratosthenes(n):\n", " M = list(range(2,n+1))\n", " for d in range(2,n+1):\n", " # for d = 2, 3, ... we remove the elements 2d, 3d, ... from M\n", " Mnew = []\n", " for m in M:\n", " if not (m%d == 0 and m//d >= 2):\n", " Mnew.append(m)\n", " M = Mnew\n", " return M" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Write a documentation for the function, including a few doctests.\n", "* Save the function into a Sage file and use a console to run the doctests.\n", "* Unfortunately it takes a long time to finish for ``n=30000``. Can you find out which parts are slow, and write a version which finishes in under 5 seconds for ``n=30000`` (either by improving the code above, or by writing a new one)?
*Hint:* You are also allowed to use your mathematical understanding to improve the algorithm above!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Solution (click to expand)\n", " \n", "Here is a reasonable documentation for the function ``eratosthenes``. Since it only has one input variable, we don't need a separate ``INPUT`` section. Note that in the doctests we checked some non-trivial examples, in particular for zero or negative inputs.\n", "```\n", "def eratosthenes(n):\n", " r\"\"\"\n", " Computes the set of prime numbers among 2, ..., n using the sieve\n", " of Eratosthenes.\n", " \n", " EXAMPLES::\n", " \n", " sage: eratosthenes(7)\n", " [2, 3, 5, 7]\n", " sage: eratosthenes(12)\n", " [2, 3, 5, 7, 11]\n", " sage: eratosthenes(1)\n", " []\n", " sage: eratosthenes(-5)\n", " []\n", "\n", " \"\"\"\n", " M = list(range(2,n+1))\n", " ...\n", "```\n", "After saving the code to a Sage file (e.g. ``esieve.sage``) we can run our doctests as follows:\n", "```\n", "$ sage -t esieve.sage\n", "too few successful tests, not using stored timings\n", "Running doctests with ID 2022-04-29-12-05-50-3199b3a7.\n", "Using --optional=bliss,ccache,cmake,coxeter3,dochtml,mcqd,primecount,sage,tdlib\n", "Doctesting 1 file.\n", "sage -t esieve.sage\n", " [4 tests, 0.00 s]\n", "----------------------------------------------------------------------\n", "All tests passed!\n", "----------------------------------------------------------------------\n", "Total time for all tests: 0.5 seconds\n", " cpu time: 0.0 seconds\n", " cumulative wall time: 0.0 seconds\n", "```\n", "\n", "Concerning optimization: using ``%prun`` we can see that a significant part of the time is spent on appending new elements (since in each iteration we build an entirely new list by starting with the empty list and appending the elements we like. In general, it is much faster to use list comprehension, or specialized methods like ``filter`` to create new lists or iterators.\n", "\n", "Also, mathematically we know that it makes no sense to filter out multiples of numbers ``d`` which we already saw are not prime numbers. So it is better to always just filter out the multiples of the smallest number left in our list (which we know must be a prime number since it was not filtered out earlier). Here is some code which finishes reasonably fast:\n", "```\n", "def eratosthenes(n):\n", " r\"\"\"\n", " Computes the set of prime numbers among 2, ..., n using the sieve\n", " of Eratosthenes.\n", " \n", " EXAMPLES::\n", " \n", " sage: eratosthenes(7)\n", " [2, 3, 5, 7]\n", " sage: eratosthenes(12)\n", " [2, 3, 5, 7, 11]\n", " sage: eratosthenes(1)\n", " []\n", " sage: eratosthenes(-5)\n", " []\n", " \"\"\"\n", " M = list(range(2,n+1))\n", " N = []\n", " while M:\n", " d = M.pop(0)\n", " N.append(d)\n", " M = list(filter(lambda m: m%d != 0, M))\n", " return N\n", "```\n", "By repeating the doctests after our changes, we can make sure that our changes did not break the function.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "Here is a program solving random systems of equations:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def random_equation_solver(n, m):\n", " # create an nxn matrix of random integers\n", " M = matrix(QQ,[[randint(-100, 100) for i in range(n)] for j in range(n)])\n", " for count in range(m):\n", " # create a random vector of length n\n", " v = vector(QQ, [randint(-100,100) for i in range(n)])\n", " # solve the equation M*x = v\n", " M.solve_right(v)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can you see how to make it faster for ``(n,m) = (40, 1000)``?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Solution (click to expand)\n", " \n", "Looking at the code, we see that for the created matrix ``M``, it solves linear systems ``M * x = v`` for ``m`` vectors ``v``. For large numbers ``m``, it is faster to once compute the inverse matrix of ``M``, and then solve the system just by multiplying it against ``v``. Thus the following code is indeed faster:\n", "\n", "```\n", "def random_equation_solver2(n, m):\n", " # create an nxn matrix of random integers\n", " M = matrix(QQ,[[randint(-100, 100) for i in range(n)] for j in range(n)])\n", " Minv = M.inverse()\n", " for count in range(m):\n", " # create a random vector of length n\n", " v = vector(QQ, [randint(-100,100) for i in range(n)])\n", " # solve the equation M*x = v\n", " Minv*v\n", "```\n", "\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "SageMath 9.1", "language": "sage", "name": "sagemath" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }